Abstract

PRISM is an extension of Prolog with probabilistic predicates and built-in support for 
expectation-maximization learning. Constraint Handling Rules (CHR) is a high-level pro- 
gramming language based on multi-headed multiset rewrite rules. 

In this paper, we introduce a new probabilistic logic formalism, called CHRiSM, based
on a combination of CHR and PRISM. It can be used for high-level rapid prototyping
of complex statistical models by means of "chance rules". The underlying PRISM system
can then be used for several probabilistic inference tasks, including probability
computation and parameter learning. We define the CHRiSM language in terms of syntax and
operational semantics, and illustrate it with examples. We define the notion of
ambiguous programs and define a distribution semantics for unambiguous programs. Next, we
describe an implementation of CHRiSM, based on CHR(PRISM). We discuss the relation
between CHRiSM and other probabilistic logic programming languages, in particular
PCHR. Finally, we identify potential application domains.



1 Introduction

Constraint Handling Rules (Frühwirth 2009; Sneyers et al. 2010) is a high-level
language extension based on multi-headed rules. Originally, CHR was designed as a
special-purpose language to implement constraint solvers, but in recent years it has 
matured into a general purpose programming language. Being a language exten- 
sion, CHR is implemented on top of an existing programming language, which is 
called the host language. An implementation of CHR in host language X is called 
CHR(X). For instance, several CHR(Prolog) systems are available. 

PRISM (PRogramming In Statistical Modeling) is a probabilistic extension of 
Prolog (Sato 2008). It supports several probabilistic inference tasks, including sam- 
pling, probability computation, and expectation-maximization (EM) learning. 

In this paper, we construct a new formalism, called CHRiSM — short for CHance 
Rules induce Statistical Models. It is based on CHR(PRISM) and it combines the 
advantages of CHR and those of PRISM. Like CHR, CHRiSM is a very concise and 
expressive programming language. Like PRISM, CHRiSM has built-in support for 
several probabilistic inference tasks. Furthermore, since CHRiSM is implemented as
a translation to CHR(PRISM) (which itself is translated to PRISM and ultimately
Prolog), CHRiSM rules can be freely mixed with CHR rules and Prolog clauses.



This paper is based on an earlier workshop paper (Sneyers et al. 2009). Although 
it is mostly self-contained, some familiarity with CHR and PRISM is recommended. 

We use $\uplus$ for multiset union, $\subseteqq$ for multiset subset, and $\bar\exists_A B$ to
denote $\exists x_1, \ldots, x_n : B$, with $\{x_1, \ldots, x_n\} = vars(B) \setminus vars(A)$,
where $vars(A)$ are the (free) variables in $A$; if $A$ is omitted it is empty (so
$\bar\exists B$ denotes the existential closure of $B$).

2 Syntax and Semantics of CHRiSM 

In this section we define CHRiSM. The syntax is defined in Section 2.1 and the
(abstract) operational semantics is defined in Section 2.2. Finally, in Section 2.3
the notion of observations is introduced.

2.1 Syntax and Informal Semantics 

A CHRiSM program $\mathcal{P}$ consists of a sequence of chance rules. Chance rules rewrite
a multiset $\mathbb{S}$ of data elements, which are called (CHRiSM) constraints (mostly for
historical reasons). Syntactically, a constraint c(X1,..,Xn) looks like a Prolog
predicate: it has a functor c of some arity n and arguments X1,..,Xn which are Prolog
terms. The multiset $\mathbb{S}$ of constraints is called the constraint store or just store. The
initial store is called the query or goal, the final store (obtained by exhaustive rule
application) is called the answer or result.

Chance rules. A chance rule is of the following form: 

P ?? Hk \ Hr <=> G | B.

where P is a probability expression (as defined below), Hk is a conjunction of (kept 
head) constraints, Hr is a conjunction of (removed head) constraints, G is a guard 
condition (a Prolog goal to be satisfied), and B is the body of the rule. If Hk is 
empty, the rule is called a simplification rule and the backslash is omitted; if Hr 
is empty, the rule is called a propagation rule and it is written as "P ?? Hk ==>
G | B". If both Hk and Hr are non-empty, the rule is called a simpagation rule.
The guard G is optional; if it is omitted, the "|" is also omitted. The body B
is recursively defined as a conjunction of CHRiSM constraints, Prolog goals, and 
probabilistic disjunctions (as defined below) of bodies. 

Intuitively, the meaning of a chance rule is as follows: If the constraint store $\mathbb{S}$
contains elements that match with the head of the rule (i.e. if there is a matching
substitution $\theta$ such that $(\theta(Hk) \uplus \theta(Hr)) \subseteqq \mathbb{S}$), and furthermore, the guard G is
satisfied, then we can consider rule application. The subset of $\mathbb{S}$ that corresponds to
the head of the rule is called a rule instance. Depending on the probability expression 
P, the rule instance is either ignored or it actually leads to a rule application. Every 
rule instance may only be considered once. 

Rule application has the following effects: the constraints matching Hr are re- 
moved from the constraint store, and then the body B is executed, that is, Prolog 
goals are called and CHRiSM constraints are added into the store. 



CHR(PRISM)-based Probabilistic Logic Learning 3 

Probability expressions. A probability expression P is one of the following: 

• A number from 0 to 1, indicating the probability that the rule fires. A rule
of the form 1 ?? ... corresponds to a regular CHR rule; the "1 ??" may be
dropped. A rule of the form 0 ?? ... is never applied.

• An expression of the form eval (E) , where E is an arithmetic expression (in 
Prolog syntax). It should be ground when the rule is considered (otherwise 
a runtime instantiation error occurs). The evaluated expression indicates the 
probability that the rule fires. 

• An experiment name. This is a Prolog term which should be ground when the 
rule is considered. The probability distribution is unknown. Initially, unknown 
probabilities are set to a uniform distribution (0.5 in the case of rule proba- 
bilities). They can be changed manually using PRISM's set_sw/2 builtin, or 
automatically using PRISM's EM-learning algorithm. 

The arguments of the experiment name can include conditions, which are of
the form "cond C". Such arguments are evaluated at runtime and replaced
by either "yes" or "no", depending on whether call(C) succeeded or failed.
These conditions are just syntactic sugar, so we may ignore them w.l.o.g. For
example, the rule "foo(cond A>B) ?? c(A,B) <=> d" is syntactic sugar for
"foo(X) ?? c(A,B) <=> (A>B -> X=yes ; X=no) | d".

• Omitted (so the rule starts with "??"): this is a shorthand for a fresh zero- 
arity experiment name. 

Probabilistic disjunction. The body B of a CHRiSM rule may contain probabilistic 
disjunctions. There are two styles: 

• LPAD-style probabilistic disjunctions (Vennekens et al. 2004) of the form
"D1:P1 ; ... ; Dn:Pn", where a disjunct Di is chosen with probability Pi. The
probabilities should sum to 1 (otherwise a compile-time error occurs).

• CHRiSM-style probabilistic disjunctions of the form "P ?? D1 ; ... ; Dn",
where P is an experiment name determining the probability distribution. 

The LPAD-style probabilistic disjunctions can be seen as a special case of CHRiSM- 
style disjunctions for which the experiment name is implicit and the distribution 
is given and fixed. Unlike CHR$^\vee$ disjunctions, which create a choice point, both
kinds of probabilistic disjunctions are committed-choice: once a disjunct is chosen, 
the choice is not undone later. However, when later on in a derivation the same 
experiment is sampled again, the result can of course be different. 

2.2 Operational Semantics 

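The abstract operational semantics of a CHRiSM program $\mathcal{P}$ is given by a
state-transition system that resembles the abstract operational semantics $\omega_t$ of CHR
(Sneyers et al. 2010). The execution states are defined analogously, except that we
additionally define a unique failed execution state which is denoted by "fail"
(because we don't want to distinguish between different failed states). We use the
symbol $\omega_t^{??}$ to refer to the abstract operational semantics of CHRiSM.

1. Fail. $\langle \{b\} \uplus \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \overset{1}{\rightarrowtail}_{\mathcal{P}} \mathit{fail}$

where $b$ is a built-in (Prolog) constraint and $\mathcal{D}_{\mathcal{H}} \models \neg\bar\exists(\mathbb{B} \wedge b)$.

2. Solve. $\langle \{b\} \uplus \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \overset{1}{\rightarrowtail}_{\mathcal{P}} \langle \mathbb{G}, \mathbb{S}, b \wedge \mathbb{B}, \mathbb{T} \rangle_n$

where $b$ is a built-in (Prolog) constraint and $\mathcal{D}_{\mathcal{H}} \models \bar\exists(\mathbb{B} \wedge b)$.

3. Introduce. $\langle \{c\} \uplus \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \overset{1}{\rightarrowtail}_{\mathcal{P}} \langle \mathbb{G}, \{c\#n\} \cup \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_{n+1}$

where $c$ is a CHRiSM constraint.

4. Probabilistic-Choice. $\langle \{d\} \uplus \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \overset{p_i}{\rightarrowtail}_{\mathcal{P}} \langle \{d_i\} \uplus \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n$

where $d$ is a probabilistic disjunction of the form $d_1{:}p_1 ; \ldots ; d_k{:}p_k$ or of the form
$P$ ?? $d_1 ; \ldots ; d_k$, where the probability distribution given by $P$ assigns the
probability $p_i$ to the disjunct $d_i$.

5. Maybe-Apply. $\langle \mathbb{G}, H_1 \uplus H_2 \uplus \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \overset{1-p}{\rightarrowtail}_{\mathcal{P}} \langle \mathbb{G}, H_1 \uplus H_2 \uplus \mathbb{S}, \mathbb{B}, \mathbb{T} \cup \{h\} \rangle_n$

$\langle \mathbb{G}, H_1 \uplus H_2 \uplus \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \overset{p}{\rightarrowtail}_{\mathcal{P}} \langle B \uplus \mathbb{G}, H_1 \uplus \mathbb{S}, \theta \wedge \mathbb{B}, \mathbb{T} \cup \{h\} \rangle_n$

where the $r$-th rule of $\mathcal{P}$ is of the form $P$ ?? $H'_1 \setminus H'_2$ <=> $G$ | $B$,
$\theta$ is a matching substitution such that $chr(H_1) = \theta(H'_1)$ and $chr(H_2) = \theta(H'_2)$,
$h = (r, id(H_1), id(H_2)) \notin \mathbb{T}$, and $\mathcal{D}_{\mathcal{H}} \models \mathbb{B} \rightarrow \bar\exists_{\mathbb{B}}(\theta \wedge G)$. If $P$ is a number, then $p = P$.
Otherwise $p$ is the probability assigned to the success branch of $P$.

Fig. 1. Transition relation $\rightarrowtail_{\mathcal{P}}$ of the abstract operational semantics $\omega_t^{??}$ of CHRiSM.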

Definition 2.1 (identified constraint)

An identified constraint c#i is a CHRiSM constraint c associated with some unique
integer i. This number serves to differentiate between copies of the same constraint.
We introduce the functions chr(c#i) = c and id(c#i) = i, and extend them to
sequences and sets in the obvious manner, e.g., $id(S) = \{ i \mid c\#i \in S \}$.

Definition 2.2 (execution state) 

An execution state $\sigma$ is a tuple $\langle \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n$. The goal $\mathbb{G}$ is a multiset of constraints
to be rewritten to solved form. The store $\mathbb{S}$ is a set of identified constraints that can
be matched with rules in the program $\mathcal{P}$. Note that $chr(\mathbb{S})$ is a multiset although
$\mathbb{S}$ is a set. The built-in store $\mathbb{B}$ is the conjunction of all Prolog goals that have been
called so far. The history $\mathbb{T}$ is a set of tuples, each recording the identifiers of the
CHRiSM constraints that fired a rule and the rule number. The history is used to
prevent trivial non-termination: a rule instance is allowed to be considered only
once. Finally, the counter $n \in \mathbb{N}$ represents the next free identifier.

We use $\sigma, \sigma_0, \sigma_1, \ldots$ to denote execution states and $\Sigma^{CHR}$ to denote the set of
all execution states. We use $\mathcal{D}_{\mathcal{H}}$ to denote the theory defining the host language
(Prolog) built-ins and predicates used in the CHRiSM program. For a given CHRiSM
program $\mathcal{P}$, the transitions are defined by the binary relation
$\rightarrowtail_{\mathcal{P}} \subseteq \Sigma^{CHR} \times \Sigma^{CHR}$ shown in Figure 1. Every transition is annotated with a probability.

Execution proceeds by exhaustively applying the transition rules, starting from
an initial state (root) of the form $\langle Q, \emptyset, true, \emptyset \rangle_0$ and performing a random walk in
the directed acyclic graph defined by the transition relation $\rightarrowtail_{\mathcal{P}}$, until a leaf node
is reached, which is called a final state. We consider only terminating programs
(finite transition graphs). Given a path from an initial state to the state $\sigma$, we
define the probability of $\sigma$ to be the product of the probabilities along the path.
We use $\sigma_0 \overset{p}{\rightarrowtail}{}_{\mathcal{P}}^{*} \sigma_k$ to denote a series of $k \geq 0$ transitions

$\sigma_0 \overset{p_1}{\rightarrowtail}_{\mathcal{P}} \sigma_1 \overset{p_2}{\rightarrowtail}_{\mathcal{P}} \sigma_2 \overset{p_3}{\rightarrowtail}_{\mathcal{P}} \cdots \overset{p_k}{\rightarrowtail}_{\mathcal{P}} \sigma_k$

where $p = \prod_{i=1}^{k} p_i$ if $k > 0$ and $p = 1$ otherwise. If $\sigma_0$ is an initial state and $\sigma_k$ is a
final state, then we call such a series of transitions a derivation of probability $p$. We
define a function prob to give the probability of a derivation: $prob(\sigma_0 \overset{p}{\rightarrowtail}{}_{\mathcal{P}}^{*} \sigma_k) = p$.

Note that if all rule probabilities are 1 and the program contains no probabilistic
disjunctions (i.e. if the CHRiSM program is actually just a regular CHR program),
then the $\omega_t^{??}$ semantics boils down to the $\omega_t$ semantics of CHR.



2.3 Full and Partial Observations 

A full observation Q <==> A denotes that there exists a series of probabilistic choices
such that a derivation starting with query Q results in the answer A. A partial
observation Q ===> A denotes that an answer for query Q contains at least A: in
other words, Q ===> A holds iff Q <==> B with $A \subseteqq B$.

Definition 2.3 (observation)

A full observation is of the form Q <==> A, where Q and A are conjunctions of
constraints. Given a program $\mathcal{P}$, a full observation refers to derivations of the form

$\langle Q, \emptyset, true, \emptyset \rangle_0 \rightarrowtail_{\mathcal{P}}^{*} \langle \emptyset, A', \mathbb{B}, \mathbb{T} \rangle_n \not\rightarrowtail_{\mathcal{P}}$

such that $A = chr(A')$. A partial observation is of the form Q ===> A. It refers to
derivations of the above form, such that $A \subseteqq chr(A')$.

We also allow "negated" CHRiSM constraints in the right hand side:
Q ===> A,~N is a shorthand for Q <==> B with $A \subseteqq B$ and $N \not\subseteqq B \setminus A$.

The following PRISM built-ins can be used to query a CHRiSM program: 

• sample Q : probabilistically execute the query Q; 

• prob Q <==> A : compute the probability that Q <==> A holds, i.e. the chance 
that the choices are such that query Q results in answer A; 

• prob Q ===> A : compute the probability that an answer for Q contains A; 

• learn(L) : perform EM-learning based on a list L of observations.

In observation lists, the syntax "n times A" or "count(A,n)" can be used to
denote that observation A occurred n times. This is simply a shorthand for repeating
the same observation (A) a number of times (n).



3 Example programs 

As a first toy example, consider the following CHRiSM program for tossing a coin: 

toss <=> head: 0.5 ; tail: 0.5. 

The query "sample toss" results in "head" or "tail", with 50% chance each. The 
query "sample toss, toss" has four possible outcomes, each with 25% chance:
"head, head", "head, tail", "tail, head", and "tail, tail".

Fig. 2. A derivation tree for the rock-paper-scissors example.

Note that observations are not sensitive to the order in which the result is given. 
As a result, the query "prob toss, toss <==> head, tail" returns a probability 
of 50%, because the outcome "tail, head" also matches the observation.
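The order-insensitivity of observations can be checked by brute-force enumeration. The following Python sketch is only an illustration, not part of the CHRiSM system: the names `TOSS`, `answer_distribution` and `prob_full` are hypothetical, and the final store is modelled as a sorted tuple so that "head, tail" and "tail, head" collapse into one answer.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcome distribution of the rule "toss <=> head:0.5 ; tail:0.5".
TOSS = {"head": Fraction(1, 2), "tail": Fraction(1, 2)}


def answer_distribution(n_tosses):
    """Map each final store (a multiset, so order is ignored) to its probability."""
    dist = Counter()
    for outcome in product(TOSS, repeat=n_tosses):
        p = Fraction(1)
        for face in outcome:
            p *= TOSS[face]
        # Sorting the outcome forgets the order in which constraints were added.
        dist[tuple(sorted(outcome))] += p
    return dict(dist)


def prob_full(n_tosses, answer):
    """Probability of the full observation 'toss,...,toss <==> answer'."""
    return answer_distribution(n_tosses).get(tuple(sorted(answer)), Fraction(0))


print(prob_full(2, ["head", "tail"]))  # 1/2
```

Two of the four equally likely derivations end in the store {head, tail}, which is where the 50% comes from.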



3.1 Rock-paper-scissors

Consider the following CHRiSM program simulating "rock-paper-scissors" players: 

player(P) <=> choice(P) ?? rock(P) ; scissors(P) ; paper(P).
rock(P1), scissors(P2) ==> winner(P1).
scissors(P1), paper(P2) ==> winner(P1).
paper(P1), rock(P2) ==> winner(P1).

We assume that each player has his own fixed probability distribution for choosing
between rock, scissors, and paper. This is denoted by using choice(P) as the
probability expression for the choice in the first rule: the probability distribution
depends on the value of P and thus every player has his own distribution. However,
these distributions are not known to us. By default, the unknown probability
distributions for, say, tom and jon are therefore both set to the uniform distribution,
which implies, among other things, that each player should win one third of the
time (cf. Figure 2). Here is a possible interaction:

?- sample player(tom),player(jon)
player(tom),player(jon) <==> rock(jon),rock(tom).

?- sample player(tom),player(jon)
player(tom),player(jon) <==> rock(jon),paper(tom),winner(tom).

?- prob player(tom),player(jon) ===> winner(tom)
Probability of player(tom),player(jon)===>winner(tom) is: 0.333333

Now suppose that we watch 100 games, and want to use our observations to 
obtain a better model of the playing style of both players. If we can fully observe 
these games, then this is easy: we can just use the frequency with which each player 
played rock, paper or scissors as an estimate for the probability of him making that 
particular move. The situation becomes more difficult, however, if the games are 
only partly observable. For instance, suppose that we do not know which moves the 
players made, but are only told the final scores: tom won 50 games, jon won 20, and
30 games were a tie. Deriving estimates for the probabilities of individual moves
from this information is less straightforward. For this reason, PRISM comes with a
built-in implementation of the EM-algorithm for performing parameter estimation
in the presence of missing information (Kameya and Sato 2000). We can use this
algorithm to find plausible corresponding distributions:

| ?- learn([(50 times player(tom),player(jon) ===> winner(tom)),
            (20 times player(tom),player(jon) ===> winner(jon)),
            (30 times player(tom),player(jon) ===> ~winner(tom),~winner(jon))])

The PRISM built-in show_sw shows the learned probability distributions, which do 
indeed (approximately) lead to the observation frequencies, e.g.: 

| ?- show_sw
Switch choice(jon): 1 (p: 0.60057) 2 (p: 0.06536) 3 (p: 0.33406)
Switch choice(tom): 1 (p: 0.08420) 2 (p: 0.20973) 3 (p: 0.70605)

| ?- prob player(tom),player(jon) ===> winner(tom)
Probability of player(tom),player(jon)===>winner(tom) is: 0.499604
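The reported probabilities can be cross-checked outside PRISM. The Python sketch below is an independent illustration under one assumption: switch outcomes 1, 2 and 3 stand for rock, scissors and paper, in the order in which the disjuncts appear in the first rule. The names `MOVES`, `BEATS` and `outcome_probs` are hypothetical.

```python
# Outcome indices 1, 2, 3 are assumed to follow the rule body: rock, scissors, paper.
MOVES = ("rock", "scissors", "paper")
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def outcome_probs(dist_tom, dist_jon):
    """Return (P(tom wins), P(jon wins), P(tie)) for two independent choices."""
    tom = jon = tie = 0.0
    for move_t, p_t in zip(MOVES, dist_tom):
        for move_j, p_j in zip(MOVES, dist_jon):
            p = p_t * p_j
            if BEATS[move_t] == move_j:
                tom += p
            elif BEATS[move_j] == move_t:
                jon += p
            else:
                tie += p
    return tom, jon, tie


uniform = (1 / 3, 1 / 3, 1 / 3)
learned_tom = (0.08420, 0.20973, 0.70605)  # switch choice(tom) as shown by show_sw
learned_jon = (0.60057, 0.06536, 0.33406)  # switch choice(jon) as shown by show_sw

print(outcome_probs(uniform, uniform))
print(outcome_probs(learned_tom, learned_jon))
```

With uniform switches each of the three results has probability 1/3; with the learned switches the three values come out at roughly 0.50, 0.20 and 0.30, in line with the 50/20/30 observation counts.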

3.2 Random graphs 

Suppose we want to generate a random directed graph, given its nodes. The follow- 
ing rule generates every possible directed edge with probability 50%: 

0.5 ?? node(A), node(B) ==> edge(A,B).

The above rule generates dense graphs; if we want to get a sparse graph, say with
an average (out-)degree of 3, we can use the following rule. The auxiliary constraint
nb_nodes(n) contains the total number of nodes n; the probability of the rule is
such that each of the n(n-1) possible edges is generated with probability 3/(n-1),
so on average it generates 3n edges:

eval(3/(N-1)) ?? nb_nodes(N), node(A), node(B) ==> edge(A,B).
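The edge count can be sanity-checked with a small simulation. This Python sketch is an illustration only and does not use CHRiSM; `expected_edges` and `sample_graph` are hypothetical names, and the sketch assumes n > 4 so that 3/(n-1) is a valid probability below 1.

```python
import random


def expected_edges(n, avg_out_degree=3):
    """n(n-1) candidate edges, each kept with probability avg_out_degree/(n-1)."""
    p = avg_out_degree / (n - 1)
    return n * (n - 1) * p


def sample_graph(nodes, avg_out_degree=3, rng=None):
    """Keep every ordered pair of distinct nodes independently with probability p."""
    rng = rng or random.Random()
    p = avg_out_degree / (len(nodes) - 1)
    return [(a, b) for a in nodes for b in nodes
            if a != b and rng.random() < p]


edges = sample_graph(list(range(50)), rng=random.Random(0))
print(len(edges), expected_edges(50))
```

For 50 nodes the expected number of edges is 150; an individual sample fluctuates around that value.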

3.3 Bayesian networks 

Bayesian networks are one of the most widely used kinds of probabilistic models. A 
classical example (Pearl 1988) of a Bayesian network is that describing the following 
alarm system. Suppose there is some probability that there is a burglary, and also 
that there is some probability that an earthquake happens. The probability that 
the alarm goes off depends on whether those events happen. Also, the probability 
that John calls the police depends on whether the alarm went off, and similarly for 
the probability that Mary calls. 
This Bayesian network can be described in CHRiSM in a straightforward way: 

go ==> ?? burglary(yes) ; burglary(no).
go ==> ?? earthquake(yes) ; earthquake(no).
burglary(B), earthquake(E) ==> B,E ?? alarm(yes) ; alarm(no).
A ?? alarm(A) ==> johncalls.
A ?? alarm(A) ==> marycalls.

The probability distributions can be estimated given full observations (e.g., go <==>
go, burglary(no), earthquake(yes), alarm(yes), marycalls.), or given
partial observations (e.g., go ===> johncalls, ~marycalls.).

In this way, each Bayesian network can be represented in CHRiSM. We can derive 
the same information from it as can be derived from the network itself. 



4 Ambiguity and a Distribution Semantics for CHRiSM 

In addition to the very nondeterministic abstract operational semantics $\omega_t^{??}$, we can
also define more deterministic instantiations of $\omega_t^{??}$, just like $\omega_r$ and $\omega_p$ are
instantiations of $\omega_t$ (see also (Sneyers and Frühwirth 2008)). In the current
implementation of CHRiSM we use the "refined semantics of CHRiSM", defined analogously
to (Duck et al. 2004). Of course CHRiSM can also be given a "priority semantics"
(De Koninck et al. 2007) in order to get a more intuitive mechanism for execution
control.



4.1 Instantiations of $\omega_t^{??}$

Any CHRiSM system uses a (computable) execution strategy in the sense of
(Sneyers and Frühwirth 2008). Note that in (Sneyers and Frühwirth 2008), an execution
strategy completely fixes the derivation for a given input goal. In the context of
CHRiSM this is no longer the case because of the probabilistic choices. However, we
may assume that the derivation is fixed if the same choices are made. In other words,
the only choice is in the probabilistic choices inside the transitions
"Probabilistic-Choice" and "Maybe-Apply"; there is no nondeterminism in choosing
which $\omega_t^{??}$ transition to apply next.

Definition 4.1 (execution strategy)

An execution strategy fixes the non-probabilistic choices during an $\omega_t^{??}$ derivation.
Formally, $\rightarrow_{\xi}$ is an execution strategy for a program $\mathcal{P}$ if $\rightarrow_{\xi} \subseteq \rightarrowtail_{\mathcal{P}}$ and for
every execution state $\sigma \in \Sigma^{CHR}$, the set $S$ of all transitions of the form $\sigma \overset{p}{\rightarrow}_{\xi} \sigma'$
corresponds to at most one of the five types of transitions of $\omega_t^{??}$, that is, either

• $S = \emptyset$ and no $\omega_t^{??}$ transition is applicable;

• or $S$ is a singleton corresponding to a Fail, Solve or Introduce transition;

• or $S$ is a set of transitions corresponding to the Probabilistic-Choice
transition for one specific disjunction;

• or $S$ is a set of transitions corresponding to the Maybe-Apply transition for
one specific rule instantiation.

It follows from this definition that for non-final states $\sigma$, the sum of the probabilities
of all transitions from $\sigma$ is one under any execution strategy. We use $\sigma_0 \overset{p}{\rightarrow}{}_{\xi}^{*} \sigma_k$
to denote a series of $k \geq 0$ transitions

$\sigma_0 \overset{p_1}{\rightarrow}_{\xi} \sigma_1 \overset{p_2}{\rightarrow}_{\xi} \sigma_2 \overset{p_3}{\rightarrow}_{\xi} \cdots \overset{p_k}{\rightarrow}_{\xi} \sigma_k$

where $p = \prod_{i=1}^{k} p_i$ if $k > 0$ and $p = 1$ otherwise, as before.

Definition 4.2 (strategy class)

A strategy class $\Omega(\mathcal{P})$ is a set of execution strategies for $\mathcal{P}$. The strategy class
$\Omega_t^{??}(\mathcal{P})$ is the set of all execution strategies for $\mathcal{P}$.

4.2 Distribution Semantics

Firstly, we define equivalence of execution states. We use a definition based on 
(Raiser et al. 2009) but adapted to our needs. Intuitively, we say two states are
equivalent if the constraint stores are equal and the built-in stores are equivalent; 
we do not care about identifiers and propagation histories. 

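Definition 4.3 (equivalent states)

Equivalence between execution states is the smallest equivalence relation $\equiv$ s.t.:

1. $\langle \mathbb{G}, \mathbb{S}, x = t \wedge \mathbb{B}, \mathbb{T} \rangle_n \equiv \langle \mathbb{G}, \mathbb{S}[x/t], x = t \wedge \mathbb{B}, \mathbb{T} \rangle_n$

2. $\langle \mathbb{G}, \mathbb{S}, x = t \wedge \mathbb{B}, \mathbb{T} \rangle_n \equiv \langle \mathbb{G}[x/t], \mathbb{S}, x = t \wedge \mathbb{B}, \mathbb{T} \rangle_n$

3. $\langle \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \equiv \langle \mathbb{G}, \mathbb{S}', \mathbb{B}, \mathbb{T}' \rangle_{n'}$ if $chr(\mathbb{S}) = chr(\mathbb{S}')$

4. $\langle \mathbb{G}, \mathbb{S}, \mathbb{B}, \mathbb{T} \rangle_n \equiv \langle \mathbb{G}, \mathbb{S}, \mathbb{B}', \mathbb{T} \rangle_n$ if $\mathcal{D}_{\mathcal{H}} \models \bar\exists_{\mathbb{G},\mathbb{S}} \mathbb{B} \leftrightarrow \bar\exists_{\mathbb{G},\mathbb{S}} \mathbb{B}'$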

We now define the probability of getting some result (given an execution strategy) 
as the sum of the probabilities of ending up in a final state equivalent with it: 

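Definition 4.4 (observation probability)

Given a program $\mathcal{P}$ and an execution strategy $\rightarrow_{\xi} \in \Omega_t^{??}(\mathcal{P})$, we write

$\sigma_i \overset{p_{tot}}{\leadsto}_{\xi} \sigma_f$

if $\sigma_f$ is a final state and $p_{tot} = \sum_{d \in D} prob(d)$ where $D = \{ \sigma_i \rightarrow_{\xi}^{*} \sigma'_f \mid \sigma'_f \equiv \sigma_f \}$.
We say that $p_{tot}$ is the probability of observing the result $\sigma_f$ for the query $\sigma_i$.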

4.3 Ambiguity

Some programs are ambiguous in the sense that they do not define a unique prob- 
ability distribution over the possible end states. Consider the following example: 

0.5 ?? a <=> b. 
0.5 ?? a <=> c. 

Suppose the query is "a". If we use an execution strategy that starts with the first
rule, then with 50% chance this rule is applied and we get the final result "b".
Otherwise (with 50% chance) the second rule is considered, which results in "c" with
an overall probability of 25%; when neither rule is applied, the result is "a", also
with a probability of 25%. However, if we use an execution strategy that considers
the second rule first, then we get a different distribution: "c" has a probability of
50%, and "b" a probability of 25%.
A program is unambiguous if the probability of an observation does not depend on
the execution strategy. The program in the above example is ambiguous in general,
but it is unambiguous w.r.t. the refined strategy class. Under the refined semantics,
the first rule is always considered first, thus the above program defines only the
first probability distribution on final states. In general, we define ambiguity w.r.t.
a strategy class; if the strategy class is omitted, we assume it is the most general
strategy class corresponding to all execution strategies that instantiate $\omega_t^{??}$.

Definition 4.5 (unambiguous program)

A CHRiSM program $\mathcal{P}$ is unambiguous (w.r.t. a strategy class $\Omega$) if, for all states
$\sigma_i, \sigma_f \in \Sigma^{CHR}$ and all execution strategies $\rightarrow_{\xi_1}, \rightarrow_{\xi_2} \in \Omega$, we have:

if $\sigma_i \overset{p_1}{\leadsto}_{\xi_1} \sigma_f$ and $\sigma_i \overset{p_2}{\leadsto}_{\xi_2} \sigma_f$ then $p_1 = p_2$.



The distribution semantics (w.r.t. strategy class $\Omega$) of an unambiguous (w.r.t. $\Omega$)
CHRiSM program is defined for every query Q as the probability distribution over
the equivalence classes of final states of derivations (of $\Omega$).

Without specification of an execution strategy, ambiguous CHRiSM programs do 
not have a well-defined meaning — they don't define a unique probability distribu- 
tion over the final states, but several distributions, depending on which execution 
strategy is used. Ambiguity can be reduced by using a more instantiated strat- 
egy class. The current CHRiSM system uses the refined semantics. Many programs 
that are ambiguous in general are unambiguous w.r.t. the refined strategy class, 
but not all of them. As a counterexample, consider the program consisting of the 
rule "0.5 ?? a, b(X) <=> c(X)" with the query "b(1), b(2), a". There are two
ways to execute this program in the refined semantics: one in which the rule in- 
stantiation "a, b(l)" is considered first, and one in which the rule instantiation 
"a, b(2)" is considered first. According to the first execution strategy, the result 
is "c(l), b(2)" with a probability of 50%, "c(2), b(l)" with a probability of 
25%, and "a, b(l), b(2)" with a probability of 25%. According to the second 
execution strategy the probabilities of the first two outcomes are switched. 

Ambiguity vs. confluence. Ambiguity of CHRiSM programs is related to confluence 
(Abdennadher et al. 1999) of CHR programs. A CHR program is confluent if for
every query, all derivations (under the $\omega_t$ semantics) lead to equivalent final states.
Confluent CHR programs tend to correspond to unambiguous CHRiSM programs. 
For example, programs with only propagation rules are always confluent and un- 
ambiguous. However, confluence and unambiguity do not coincide. For example, a 
program consisting of the rule "a <=> b:0.5 ; c:0.5" is not confluent (because
for the query "a" it has two non-equivalent final states) but it is unambiguous. 
Vice versa, some programs are confluent CHR programs while they are ambiguous 
CHRiSM programs. For example, consider the following program: 

0.5 ?? a <=> b. 
0.5 ?? a <=> c. 
0.5 ?? c <=> b. 

If we ignore the probabilities and consider this as a regular CHR program, then 
we get a confluent program (all derivations for the query "a" end in the result 
"b"). However, as a CHRiSM program, it is ambiguous. If the execution strategy is 
such that the first rule is considered first for the query "a", then the probability of
ending up with the result "b" is 62.5% (50% via the first rule, plus 0.5 · 0.5 · 0.5 = 12.5%
via the second and third rules). Using an execution strategy that considers
the second rule first, the probability of getting "b" is only 50%. Therefore, the
probability depends on the execution strategy and the program is ambiguous. 
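The effect of the rule order on the distribution over final states can be computed exactly for the small programs of this section. The Python sketch below is a simplified model, not the CHRiSM implementation: it assumes a store holding a single zero-arity constraint, single-headed simplification rules, and an execution strategy given as a fixed order in which rules are tried. The function name `final_distribution` and the tuple encoding of rules are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def final_distribution(rules, order, start):
    """Exact distribution over final constraints for a one-constraint store.

    rules: list of (probability, head, body) for rules 'p ?? head <=> body'.
    order: rule indices in the order in which the strategy tries them.
    """
    dist = {}

    def walk(symbol, tried, p):
        for r in order:
            prob, head, body = rules[r]
            if head == symbol and r not in tried:
                # Rule fires: the constraint is replaced by a fresh one,
                # so no rule instance has been considered for it yet.
                walk(body, frozenset(), p * prob)
                # Rule does not fire: the instance enters the history
                # and is never considered again.
                walk(symbol, tried | {r}, p * (1 - prob))
                return
        # No untried applicable rule is left: this is a final state.
        dist[symbol] = dist.get(symbol, 0) + p

    walk(start, frozenset(), Fraction(1))
    return dist


two_rules = [(HALF, "a", "b"), (HALF, "a", "c")]
three_rules = two_rules + [(HALF, "c", "b")]

print(final_distribution(two_rules, [0, 1], "a"))
print(final_distribution(two_rules, [1, 0], "a"))
print(final_distribution(three_rules, [0, 1, 2], "a")["b"])  # 5/8
print(final_distribution(three_rules, [1, 0, 2], "a")["b"])  # 1/2
```

For the two-rule program the two orders swap the probabilities of "b" and "c" (1/2 and 1/4), and for the three-rule program the probability of "b" is 5/8 or 1/2 depending on the order, so both programs are ambiguous in general.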
5 Implementation of CHRiSM

The implementation of CHRiSM is based on a source-to-source transformation from
CHRiSM rules to CHR(PRISM) rules. PRISM is implemented on top of B-Prolog,
and several CHR systems are currently available for B-Prolog. In (Sneyers et al. 2009)
we presented a prototype implementation of CHRiSM that used a naive CHR(PRISM)
system based on toychr¹, which is a rather naive implementation of (ground) CHR
in pure Prolog. The current implementation of CHRiSM² is based on the more
advanced Leuven CHR system (Schrijvers and Demoen 2004).



5.1 PRISM 

PRISM (Sato 2008) is a probabilistic logic programming language. It is an
extension of Prolog with a probabilistic built-in multi-valued random switch (msw). A
multi-valued switch atom msw(exp, Result) represents a probabilistic experiment
named exp (a ground Prolog term), which produces an outcome Result. The set
of possible outcomes for such an experiment is defined by means of a predicate
values(term, [v1,...,vn]), where term unifies with exp. By default, a uniform
distribution is assumed (all values are equally likely). Different probabilities can be
assigned by means of set_sw(term, [p1,...,pn]).
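The behaviour of a switch in sampling mode can be mimicked in a few lines. The Python class below is a toy analogue written for illustration; it is not PRISM's API. The class name `SwitchTable` is hypothetical, its method names merely echo values/2, set_sw/2 and msw/2, and switch names are looked up as exact keys rather than by unification.

```python
import random


class SwitchTable:
    """Toy stand-in for PRISM switches (sampling only, no unification)."""

    def __init__(self, seed=None):
        self._values = {}
        self._probs = {}
        self._rng = random.Random(seed)

    def values(self, name, outcomes):
        """Declare the outcomes of a switch; the default distribution is uniform."""
        self._values[name] = list(outcomes)
        self._probs[name] = [1.0 / len(outcomes)] * len(outcomes)

    def set_sw(self, name, probs):
        """Assign explicit outcome probabilities, which must sum to 1."""
        if len(probs) != len(self._values[name]):
            raise ValueError("one probability per outcome is required")
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self._probs[name] = list(probs)

    def msw(self, name):
        """Sample one outcome of the named switch."""
        return self._rng.choices(self._values[name], weights=self._probs[name])[0]


switches = SwitchTable(seed=42)
switches.values("coin", ["head", "tail"])
print(switches.msw("coin"))
```

This covers sampling only; probability computation and EM learning in PRISM rely on explanation search, which the sketch does not attempt to model.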

A PRISM program consists of two parts, rules R and facts F. The facts F
define a base probability distribution $P_F$ on msw-atoms, by means of the values/2
and set_sw/2 predicates. The rules R are a set of definite clauses, which are
allowed to contain the msw predicate in the body (but not in the head). This set of
clauses R serves to extend the base distribution $P_F$ to a distribution $P_{DB}(\cdot)$ over
the set of Herbrand interpretations: for each interpretation $M$ of the msw terms, the
probability $P_F(M)$ is assigned to the interpretation $I$ that is the least Herbrand
model of $R \cup M$ (distribution semantics).

5.2 Transformation to CHR(PRISM) 

The transformation from CHRiSM to CHR(PRISM) is rather straightforward and 
can be done efficiently (linear time). We illustrate it by example. Consider again 
the rule "player(P) <=> choice(P) ?? rock(P) ; scissors(P) ; paper(P)" 
from Section 3.1. It is translated to the following CHR(PRISM) code:

values(choice(_),[1,2,3]).
player(P) <=> msw(choice(P),X),
    (X=1->rock(P) ; X=2->scissors(P) ; X=3->paper(P)).



Another example is the random graph rule from Section 3.2:

eval(3/(N-1)) ?? nb_nodes(N), node(A), node(B) ==> edge(A,B).

which gets translated to the following CHR(PRISM) code:



1 by Gregory J. Duck, 2004. Download: http://www.cs.mu.oz.au/~gjd/toychr/ 

2 Download the CHRiSM system at http://people.cs.kuleuven.be/jon.sneyers/chrism/

values(experiment1,[1,2]).
nb_nodes(N), node(A), node(B) ==>
    P1 is 3/(N-1), P2 is 1-P1, set_sw(experiment1,[P1,P2]),
    msw(experiment1,X), (X=1 -> edge(A,B) ; X=2 -> true).

Probabilistic simplification rules and simpagation rules are a bit more tricky 
since it does not suffice to add a "nop"-disjunct like above. The reason is that any
removed heads are removed from the constraint store as soon as the body is entered, 
and just reinserting the removed heads potentially causes nontermination. Putting 
the msw-test in the guard of the rule also does not work as expected. In sampling 
mode, this works fine, but when doing probability computations or learning, an 
unwanted behavior emerges because of the way PRISM implements explanation 
search. During explanation search, PRISM essentially redefines msw/2 such that it 
creates a choice point and tries all values. This causes the guard to always succeed 
and thus explanations that involve not firing a chance rule are erroneously missed. 
Hence some care has to be taken to translate such rules to PRISM code that behaves 
correctly. The solution we have adopted is to add a built-in to CHR to explicitly 
remove a constraint from the head of a rule. All CHRiSM rules are translated to 
propagation rules. The removed heads are explicitly removed in the body of the 
rule, but only in the branch in which the rule instance is actually applied. 

6 Related Work 

The idea of a probabilistic version of CHR is not new. In (Frühwirth et al. 2002), a
probabilistic variant of CHR, called PCHR, was introduced. In PCHR, every rule
gets a weight representing a relative probability. A rule is chosen randomly from 
all applicable rules, according to a probability distribution given by the normalized 
weights. For example, the following PCHR program implements a coin toss: 

toss <=>0.5: head.
toss <=>0.5: tail. 

One of the conceptual advantages of PCHR, at least from a theoretical point of
view, is that its semantics instantiates the abstract operational semantics ω_t of CHR
(Sneyers et al. 2010): every PCHR derivation corresponds to some ω_t derivation.

However, the semantics of PCHR may also lead to some confusion, since it is not 
so clear what the meaning of the rule weight really is. For example, consider again 
the above coin tossing example. For the query toss we get the answer head with 
50% chance and otherwise tail, so one may be tempted to interpret weights as rule 
probabilities. However, if the second rule is removed from the program, we do not 
get the answer head with 50% chance, but with a probability of 100%. The reason 
is that the weights are normalized w.r.t. the sum of the weights of all applicable 
rules. As a result of this normalization, the actual probability of a rule can only 
be computed at runtime and by considering the full program. In other words, the 
probabilistic meaning of a single rule heavily depends on the rest of the PCHR 
program; there is no localized meaning. Also, adding weights to propagation rules 
is not very useful in practice. 
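
The normalization effect described above can be made concrete with a small Python sketch. It is not part of any PCHR implementation. The function name and the rule labels toss_head and toss_tail are hypothetical, and the sketch assumes that the set of applicable rules is given as a mapping from rule labels to weights.

```python
def pchr_rule_probabilities(applicable_rules):
    """Normalize the weights of all currently applicable rules.

    applicable_rules maps a (hypothetical) rule label to its weight.
    The result maps each label to the probability that it is chosen.
    """
    total = sum(applicable_rules.values())
    return {name: weight / total for name, weight in applicable_rules.items()}


# Both coin toss rules are applicable to the query toss.
both_rules = pchr_rule_probabilities({"toss_head": 0.5, "toss_tail": 0.5})

# The second rule has been removed from the program.
head_only = pchr_rule_probabilities({"toss_head": 0.5})
```

In the first case toss_head is chosen with probability 0.5. In the second case the same rule, with the same weight 0.5, is chosen with probability 1, which shows that the weight alone does not determine the probability of a rule.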

The abstract semantics ω_t of CHR can be instantiated to allow more execution
control and more efficient implementations. However, the PCHR semantics, even
though it conforms to ω_t, cannot be instantiated in a similar way. The reason is
that the semantics of PCHR refers to all applicable rules in order to randomly pick 
one. This conflicts fundamentally with the purpose of instantiations like the refined 
semantics, which consider only a small fragment of the set of applicable rules, e.g. 
only rules for the current active constraint occurrence. 

The ω_t^?? semantics of CHRiSM differs from that of PCHR. In particular, ω_t^??
derivations do not always correspond to ω_t derivations (although they do, in a
sense, correspond to unfinished ω_t derivations). However, the semantics of CHRiSM
can be instantiated since chance rules have a localized meaning: the application 
probability does not depend on the set of all applicable rules like in PCHR. As a 
result, it can be implemented efficiently and more execution control can be obtained. 

Further advantages of CHRiSM over PCHR are the features inherited from PRISM,
in particular probability computation and EM-learning. The existing PCHR imple- 
mentation only supports probabilistic execution, i.e. sampling. 

Probabilistic Logic Programming. There are numerous probabilistic extensions of 
logic programming. One particular family of such extensions is formed by CP-logic
or LPADs, ProbLog, ICL, and PRISM itself (Sato 2008). All of these can
be encoded in CHRiSM: in (Sneyers et al. 2009) we have shown that CP-logic (of
which ProbLog, ICL, etc. are sublogics) can be encoded in CHRiSM in a compact 
and modular way. 

Next to these "logic programming flavored" languages, there are also a number of 
formalisms that are inspired by Bayesian networks, such as BLP, RBN, CLP(BN), 
and Blog. Based on the encoding of Bayesian networks that we gave in Section 
1531 we can also translate BLPs to CHRiSM. RBNs, CLP(BN) and Blog would 
be more difficult, because they allow more complex probability distributions, for 
which CHRiSM currently does not offer support. (A more detailed description of 
these formalisms can be found in (Getoor and Taskar 2007).)

7 Potential Applications 

Both PRISM and CHR have been successfully applied in a wide range of research 
fields. Since the features of PRISM and CHR are largely orthogonal, we can ex- 
pect CHRiSM to be extremely suitable for applications at the intersection of the 
application areas of PRISM and CHR. One example of an application area at 
the intersection is abduction, which has been studied in the context of PRISM
(Sato and Kameya 2002) and also in the context of CHR (Sneyers et al. (2010),
Section 7.3.2). Computational linguistics and bio-informatics are two other domains 
in which both PRISM and CHR have proven to be very valuable tools (Sato 2008; 
Christiansen 2005; Christiansen and Lassen 2009). 

Furthermore, in many application domains of CHR, there is clearly a potential 
for probabilistic extensions of the existing approaches, for instance to deal with
uncertain information. Examples are (section numbers refer to Sneyers et al. (2010)):
scheduling (Section 7.1.1), spatio-temporal reasoning and robotics (Section 7.1.2),
multi-agent systems (Section 7.1.3), the semantic web (Section 7.1.4), type systems
(Section 7.3.1), and testing and verification (Section 7.3.5).

Another interesting application area is the automatic analysis and generation of 
music. In the past, we have used PRISM to analyse and generate music in a
probabilistic setting (Sneyers et al. 2006). There are also several deterministic approaches
based on constraints and strict rules (e.g. Boenn et al. (2008)). Preliminary results
indicate that a combined approach, using CHRiSM, is very promising. In this application,
sampling of a probabilistic model corresponds to music generation, while
parameter learning from a training set corresponds to tuning the model to a specific 
genre or composer, and probability computation (or Viterbi computation) can be 
used for music classification. 



8 Conclusion 

In this exploratory paper, we have introduced a novel rule-based probabilistic-logic 
formalism called CHRiSM, which is based on a combination of CHR and PRISM. 
We have defined an operational semantics for arbitrary CHRiSM programs and a 
distribution semantics for unambiguous CHRiSM programs. We have illustrated the 
CHRiSM system by example and we have outlined some potential application areas 
in which CHRiSM can be used. Finally, we have sketched the implementation of the 
CHRiSM system and discussed related languages, in particular PCHR. 

In our opinion, CHR has important advantages over Prolog, including complexity- 
wise completeness and the expressivity of multi-headed rules. We expect CHRiSM 
to have the same advantages over plain PRISM. 

There are several directions for future work. The notion of ambiguity and its 
relation to confluence has to be explored; in particular, the existence of a decid- 
able ambiguity test (for terminating CHRiSM programs). Although the current im- 
plementation is sufficiently efficient for sampling, it is too naive for probability 
computation and learning, since those tasks require an efficient mechanism to find 
explanations (sequences of probabilistic choices) for observations. Improving the ef- 
ficiency of explanation search is the topic of ongoing work (Sneyers 2010). Another 
limitation of the current implementation is that it only supports ground queries 
and observations. Finally, it would be interesting to transfer automatic CHR program
generation techniques (e.g. Abdennadher et al. (2006)) to CHRiSM in order
to obtain a system that supports not only parameter learning but also structure
learning (rule learning). 